Exploring EVENTS


Experiments

    1. Visualising Events Dataframe
    2. Exploring Tags Events
    3. Calculating Events Description Similarity
    4. Calculating Events Description Topic Modelling
    5. Exploring the Schedules of Events
      • 5.1 Getting the Frequency of Starting Dates of Events Schedules
      • 5.2 Getting the Frequency of End Dates of Events Schedules
    6. Exploring the Performances Tickets of Events Schedules
      • 6.1 Getting the Frequency of Price Tickets
      • 6.2 Getting the Frequency of Ticket Types (Standard, Children)
      • 6.3 Exploring Performances Places - ATTENTION: Merging information with "places" dataframe!
        • 6.3.1 Frequency of Performances per town
        • 6.3.2 Frequency of Type tickets per town
        • 6.3.3 Frequency of Price tickets type per town
        • 6.3.4 Frequency of Max_Price tickets per town
          • 6.3.4.1 Frequency of Free tickets per town
          • 6.3.4.2 Frequency of No Free tickets per town
      • 6.4 Selecting Scottish Cities: Edinburgh, Glasgow, Dundee, Perth, Inverness, Aberdeen, St Andrews
        • 6.4.1 Frequency of Price Tickets per Scottish City
        • 6.4.2 Frequency of Type Tickets per Scottish City
        • 6.4.3 Frequency of Schedules Dates per Event and per Scottish City
        • 6.4.4 Grouping Schedules per Event and Scottish City
        • 6.4.5 Exploring Tags per Schedule and Scottish Cities
          • 6.4.5.1 Exploring the Frequency of schedules tags for Edinburgh
          • 6.4.5.2 Exploring the Frequency of schedules tags for Glasgow
        • 6.4.6 Histograms of starting/end schedules dates for Edinburgh
        • 6.4.7 Working with Schedule tags, Scottish cities, Starting/End Time
          • 6.4.7.1 Frequency of schedules Starting Date in Scottish City
          • 6.4.7.2 Frequency of schedules Ending Date in Scottish City
          • 6.4.7.3 Scheduled tags and Starting Dates in Scottish City
          • 6.4.7.4 Scheduled tags and Starting Dates in Scottish City

0. Importing libraries and loading the json file with 10097 events to a dataframe

In [132]:
import json
import pandas as pd
import plotly.express as px
import os
from bertopic import BERTopic
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
import pickle
import plotly.graph_objects as go
import numpy as np
from gensim.parsing.preprocessing import remove_stopwords
import re
In [2]:
with open('dataset/sample_20180501.json', 'r') as f:
    data = json.load(f)
    print(len(data["events"]))
    events=data["events"]
df = pd.DataFrame(events)
10097

1. Visualizing the events dataframe

In [129]:
df
Out[129]:
event_id modified_ts created_ts event_name sort_name status id schedules descriptions website event_tags category properties ranking_level ranking_in_level alternative_names phone_numbers
0 194419 2021-11-13T01:03:47Z 2010-01-25T14:51:46Z Martin Simpson Martin Simpson live 194419 [{'start_ts': '2018-05-07T20:00:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... http://www.martinsimpson.com/ [Blues, Folk, Folk & world, Jazz, Music] Music {'list:website:comments-end-date': '2020-01-28... 2 1 NaN NaN
1 345866 2019-06-09T12:51:49Z 2013-03-01T10:33:08Z Red Raw Red Raw live 345866 [{'start_ts': '2018-05-07T20:30:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... NaN [Comedy, Red Raw, Stand-up] Comedy {'dropin_event': False, 'booking_essential': F... 3 2 NaN NaN
2 347164 2020-03-23T07:05:08Z 2013-03-18T13:05:44Z The Saturday Show Saturday Show live 347164 [{'start_ts': '2018-05-05T21:00:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... NaN [Comedy, Stand-up, The Saturday Show] Comedy {'dropin_event': False, 'booking_essential': F... 3 2 NaN NaN
3 347313 2020-03-24T07:05:11Z 2013-03-21T12:44:28Z The Sunday Night Laugh-In Sunday Night Laugh-In live 347313 [{'start_ts': '2018-06-03T20:30:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... NaN [Comedy, Stand-up, Sunday Night Laugh-In] Comedy {'dropin_event': False, 'booking_essential': F... 3 2 NaN NaN
4 387089 2021-09-25T01:53:13Z 2014-01-21T12:29:58Z Katherine Ryan: Glitter Room Katherine Ryan: Glitter Room live 387089 [{'start_ts': '2018-08-16T20:00:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... http://www.katherineryan.co.uk [Comedy, Stand-Up, Stand-up] Comedy {'dropin_event': False, 'booking_essential': F... 2 1 NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
10092 1115077 2018-10-09T02:43:01Z 2018-10-09T02:43:01Z The Whitney Houston Experience And The Motown ... Whitney Houston Experience And The Motown Brot... live 1115077 [{'start_ts': '2018-10-12T20:00:00+01:00', 'en... [{'type': 'description.official', 'description... NaN [Music] Music [] 3 2 NaN NaN
10093 1118842 2018-10-12T11:08:06Z 2018-10-12T09:34:50Z Aberlady Goose Walk Aberlady Goose Walk live 1118842 [{'start_ts': '2018-10-13T15:30:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... NaN [Days out, Nature, Walks, Wildlife] Days out {'dropin_event': False, 'booking_essential': F... 3 2 NaN NaN
10094 1121581 2018-10-16T02:16:15Z 2018-10-16T02:16:15Z Tokyo NIghtclub Presents HalloScream Tokyo NIghtclub Presents HalloScream live 1121581 [{'start_ts': '2018-10-27T22:30:00+01:00', 'en... [{'type': 'description.official', 'description... NaN [Clubs, Cheesy Dance, EDM, Pop] Clubs [] 3 2 NaN NaN
10095 947564 2019-01-23T10:35:10Z 2019-01-23T10:35:10Z Pamela's Palace Pamela's Palace live 947564 [{'start_ts': '2018-08-02T21:00:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... http://www.interactivetheatre.com.au [Comedy, Theatre] Comedy {'dropin_event': False, 'booking_essential': F... 3 2 NaN NaN
10096 961311 2018-11-29T17:04:14Z 2018-03-22T20:00:16Z Signor Baffo's Restaurant Signor Baffo's Restaurant live 961311 [{'start_ts': '2018-08-02T11:00:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... NaN [Kids] Kids [] 3 2 NaN NaN

10097 rows × 17 columns

In [4]:
## selecting some columns

Experiment 2: Exploring Tags Events

We are going to separate the elements stored in each tag list into new rows, one row per tag.
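As a minimal sketch of this step, the toy dataframe below (made-up events and tags, not taken from the dataset) shows how `explode` turns each list element into its own row and how the exploded rows can then be counted per tag:

```python
import pandas as pd

# Toy data: two events, each with a list of tags (made-up values)
toy = pd.DataFrame({
    "event_id": [1, 2],
    "tags": [["Comedy", "Stand-up"], ["Comedy", "Theatre"]],
})

# explode() repeats the other columns once per list element
toy_tags = toy.explode("tags")

# Count how many times each tag appears, most frequent first
tag_counts = (
    toy_tags.groupby("tags").size()
    .reset_index(name="number_of_times")
    .sort_values("number_of_times", ascending=False)
)
print(tag_counts)
```

Passing `name=` to `reset_index` names the count column directly, which avoids the separate `rename(columns={0: ...})` step.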

In [5]:
df["tags"][0:5]
Out[5]:
0     [Blues, Folk, Folk & world, Jazz, Music]
1                  [Comedy, Red Raw, Stand-up]
2        [Comedy, Stand-up, The Saturday Show]
3    [Comedy, Stand-up, Sunday Night Laugh-In]
4                 [Comedy, Stand-Up, Stand-up]
Name: tags, dtype: object
In [6]:
df_tags=df.explode('tags')
In [7]:
df_tags
Out[7]:
event_id modified_ts created_ts name sort_name status id schedules descriptions website tags category properties ranking_level ranking_in_level alternative_names phone_numbers
0 194419 2021-11-13T01:03:47Z 2010-01-25T14:51:46Z Martin Simpson Martin Simpson live 194419 [{'start_ts': '2018-05-07T20:00:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... http://www.martinsimpson.com/ Blues Music {'list:website:comments-end-date': '2020-01-28... 2 1 NaN NaN
0 194419 2021-11-13T01:03:47Z 2010-01-25T14:51:46Z Martin Simpson Martin Simpson live 194419 [{'start_ts': '2018-05-07T20:00:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... http://www.martinsimpson.com/ Folk Music {'list:website:comments-end-date': '2020-01-28... 2 1 NaN NaN
0 194419 2021-11-13T01:03:47Z 2010-01-25T14:51:46Z Martin Simpson Martin Simpson live 194419 [{'start_ts': '2018-05-07T20:00:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... http://www.martinsimpson.com/ Folk & world Music {'list:website:comments-end-date': '2020-01-28... 2 1 NaN NaN
0 194419 2021-11-13T01:03:47Z 2010-01-25T14:51:46Z Martin Simpson Martin Simpson live 194419 [{'start_ts': '2018-05-07T20:00:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... http://www.martinsimpson.com/ Jazz Music {'list:website:comments-end-date': '2020-01-28... 2 1 NaN NaN
0 194419 2021-11-13T01:03:47Z 2010-01-25T14:51:46Z Martin Simpson Martin Simpson live 194419 [{'start_ts': '2018-05-07T20:00:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... http://www.martinsimpson.com/ Music Music {'list:website:comments-end-date': '2020-01-28... 2 1 NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
10094 1121581 2018-10-16T02:16:15Z 2018-10-16T02:16:15Z Tokyo NIghtclub Presents HalloScream Tokyo NIghtclub Presents HalloScream live 1121581 [{'start_ts': '2018-10-27T22:30:00+01:00', 'en... [{'type': 'description.official', 'description... NaN EDM Clubs [] 3 2 NaN NaN
10094 1121581 2018-10-16T02:16:15Z 2018-10-16T02:16:15Z Tokyo NIghtclub Presents HalloScream Tokyo NIghtclub Presents HalloScream live 1121581 [{'start_ts': '2018-10-27T22:30:00+01:00', 'en... [{'type': 'description.official', 'description... NaN Pop Clubs [] 3 2 NaN NaN
10095 947564 2019-01-23T10:35:10Z 2019-01-23T10:35:10Z Pamela's Palace Pamela's Palace live 947564 [{'start_ts': '2018-08-02T21:00:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... http://www.interactivetheatre.com.au Comedy Comedy {'dropin_event': False, 'booking_essential': F... 3 2 NaN NaN
10095 947564 2019-01-23T10:35:10Z 2019-01-23T10:35:10Z Pamela's Palace Pamela's Palace live 947564 [{'start_ts': '2018-08-02T21:00:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... http://www.interactivetheatre.com.au Theatre Comedy {'dropin_event': False, 'booking_essential': F... 3 2 NaN NaN
10096 961311 2018-11-29T17:04:14Z 2018-03-22T20:00:16Z Signor Baffo's Restaurant Signor Baffo's Restaurant live 961311 [{'start_ts': '2018-08-02T11:00:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... NaN Kids Kids [] 3 2 NaN NaN

24706 rows × 17 columns

In [8]:
g_tags=df_tags.groupby(['tags']).size().reset_index()
g_tags=g_tags.rename(columns={0: "number_of_times"}).sort_values(by=['number_of_times'], ascending=False)
g_tags
Out[8]:
tags number_of_times
201 Comedy 2290
587 Music 2131
917 Theatre 1891
121 Books 1278
248 Days out 1203
... ... ...
467 Industrial rock 1
468 Innerleithen Music Festival 1
472 Irish 1
473 Irish music 1
1051 workshops 1

1052 rows × 2 columns

In [9]:
fig = px.line(g_tags, x="tags", y="number_of_times", title='Number of times that each tag appears')
fig.show()

Experiment 3: Description Similarity

Exploding the descriptions column

Each descriptions cell holds a list of descriptions, so we create a new row for each element of that list.

In [10]:
df["descriptions"][0:5]
Out[10]:
0    [{'type': 'description.list.default', 'descrip...
1    [{'type': 'description.list.default', 'descrip...
2    [{'type': 'description.list.default', 'descrip...
3    [{'type': 'description.list.default', 'descrip...
4    [{'type': 'description.list.default', 'descrip...
Name: descriptions, dtype: object
In [11]:
df_descriptions=df.explode('descriptions')
In [12]:
df_d=pd.concat([df_descriptions.drop(['descriptions'], axis=1), df_descriptions['descriptions'].apply(pd.Series)], axis=1)
In [13]:
df_desc=df_d[["event_id", "description"]]
In [14]:
df_desc
Out[14]:
event_id description
0 194419 Brilliant mix of English tradition and America...
0 194419 Nominated for Musician of the Year and for Bes...
1 345866 The Stand's spankingly good new talent night, ...
1 345866 Our long-running weekly beginner's showcase is...
2 347164 Saturday nights à la Stand are normally a sold...
... ... ...
10094 1121581 HalloScream is coming and you are invited! Tok...
10094 1121581 HalloScream is coming and you are invited! Tok...
10095 947564 Bittersweet comedy celebrating the vulnerabili...
10095 947564 Pamela is the hard won queen of a buzzing beau...
10096 961311 Interactive Theatre International and OH! Prod...

15120 rows × 2 columns

Finding events with similar descriptions - Deep Learning - Transformers

In [15]:
# removing the rows whose description is empty
df_desc1=df_desc.dropna(subset=['description']).reset_index()
In [16]:
df_desc1[0:5]
Out[16]:
index event_id description
0 0 194419 Brilliant mix of English tradition and America...
1 0 194419 Nominated for Musician of the Year and for Bes...
2 1 345866 The Stand's spankingly good new talent night, ...
3 1 345866 Our long-running weekly beginner's showcase is...
4 2 347164 Saturday nights à la Stand are normally a sold...
In [17]:
# total number of rows with descriptions
df_desc1.shape[0]
Out[17]:
15084
In [18]:
#selecting the description column
documents=df_desc1["description"].values
In [130]:
#d=documents[0:100]
#d=documents[:]
In [133]:
def clean_documents(text):
    text = re.sub(r'\S*@\S*\s?', '', text, flags=re.MULTILINE) # remove email
    text = re.sub(r'http\S+', '', text, flags=re.MULTILINE) # remove web addresses
    text = re.sub("\'", "", text) # remove single quotes
    text = remove_stopwords(text)
    return text
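To see what the regular expressions in clean_documents do, they can be tried on a made-up sentence. The sketch below copies only the three `re.sub` calls and leaves out the gensim stop-word step so that it runs with the standard library alone. The sample text and the function name `strip_contacts` are assumptions for illustration.

```python
import re

def strip_contacts(text):
    """Apply the same three substitutions as clean_documents (no stop-word removal)."""
    text = re.sub(r'\S*@\S*\s?', '', text)   # remove e-mail addresses
    text = re.sub(r'http\S+', '', text)      # remove web addresses
    text = re.sub("'", "", text)             # remove single quotes
    return text

# Made-up sample, not taken from the dataset
sample = "Book at info@example.org or http://example.org/tickets - it's great"
cleaned = strip_contacts(sample)
print(cleaned)
```

Note that the substitutions can leave double spaces behind. That is harmless for the sentence embeddings, but a final whitespace normalisation could be added if the cleaned text is also shown to readers.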

We save the cleaned documents in the list d.

In [135]:
d=[]
for text in documents:
    d.append(clean_documents(text))
     
In [137]:
# Using all-MiniLM-L6-v2 Transformer
model = SentenceTransformer('all-MiniLM-L6-v2')
In [138]:
# Computing our text_embeddings - encoding the cleaned descriptions with the pre-trained all-MiniLM-L6-v2 Transformer (no training takes place here)
text_embeddings = model.encode(d, batch_size = 8, show_progress_bar = True)

In [139]:
np.shape(text_embeddings)
Out[139]:
(15084, 384)
In [140]:
### A small example of how to get an embedding vector from a description
In [141]:
first_description=df_desc1["description"].iloc[0]
first_description
first_description_embedding= model.encode(first_description, batch_size = 8, show_progress_bar = True)

Finding the similarity between documents

In [142]:
similarity_def=cosine_similarity(
    [first_description_embedding],
    text_embeddings)
In [143]:
similarities = cosine_similarity(text_embeddings)
print('pairwise dense output:\n {}\n'.format(similarities))
pairwise dense output:
 [[ 1.0000001   0.41011038  0.2373322  ...  0.16511843  0.22010157
   0.22398749]
 [ 0.41011038  1.0000002   0.18249829 ...  0.10893294  0.09748274
  -0.01556886]
 [ 0.2373322   0.18249829  1.         ...  0.30644706  0.26335353
   0.3044809 ]
 ...
 [ 0.16511843  0.10893294  0.30644706 ...  0.99999976  0.40886784
   0.2768666 ]
 [ 0.22010157  0.09748274  0.26335353 ...  0.40886784  1.0000001
   0.35396916]
 [ 0.22398749 -0.01556886  0.3044809  ...  0.2768666   0.35396916
   1.        ]]

In [144]:
similarities_sorted = similarities.argsort()
similarities_sorted
Out[144]:
array([[ 3681, 15072,  1312, ...,  8267,  3147,     0],
       [14691, 14690,  3586, ..., 14144, 14143,     1],
       [ 3345,  5177,  3444, ...,  8540,  1564,     2],
       ...,
       [12908,   522,  9920, ..., 12129, 13159, 15081],
       [ 3338, 12835,   522, ...,  5769,  6483, 15082],
       [ 3789,  8234,  8233, ...,  5840,   165, 15083]])
In [145]:
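id_1 = []
id_2 = []
score = []
for index,array in enumerate(similarities_sorted):
    p=len(array)
    id_1.append(index)
    # each row is sorted by ascending similarity: array[-1] is normally the document itself,
    # so array[-2] is its nearest neighbour (duplicated descriptions can break this assumption)
    id_2.append(array[-2])
    score.append(similarities[index][array[-2]])
index_df = pd.DataFrame({'id_1' : id_1,
                          'id_2' : id_2,
                          'score' : score})
print(p)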
15084
In [146]:
index_df
Out[146]:
id_1 id_2 score
0 0 3147 0.625151
1 1 14143 0.601930
2 2 1564 0.635598
3 3 8540 0.748229
4 4 13719 0.545395
... ... ... ...
15079 15079 15079 1.000000
15080 15080 15079 1.000000
15081 15081 13159 0.571361
15082 15082 6483 0.552497
15083 15083 165 0.574197

15084 rows × 3 columns

Exploring documents 3 and 8540 - similarity score: 0.748229

In [150]:
documents[3]
Out[150]:
"Our long-running weekly beginner's showcase is regarded as the best open mic night in the UK. Catch up to ten new acts – some treading the boards for the very first time. This is where everyone starts and it's your chance to see the stars of tomorrow today.  Watch out for older hands dropping in to try out new material too."
In [148]:
documents[8540]
Out[148]:
"The Stand Comedy Club\n\nFrom Edinburgh to Glasgow to Newcastle – our long-running weekly beginner's showcase is regarded as the best open mic night in the UK. A sell-out every week in our clubs (with a minimum six month stage time waiting list) – it's not to be missed! We bring together some of our favourites for the Fringe. Watch out for rising stars trying new material too. This is where everyone starts and it’s your chance to see the stars of tomorrow, today. Different line-up nightly!\n\nAll seating is unreserved and latecomers may not be admitted.\r\n"

Finding the 10 most similar descriptions to document 3

In [151]:
## Let's take document 3 (the fourth document)
doc_index =3
documents[doc_index]
Out[151]:
"Our long-running weekly beginner's showcase is regarded as the best open mic night in the UK. Catch up to ten new acts – some treading the boards for the very first time. This is where everyone starts and it's your chance to see the stars of tomorrow today.  Watch out for older hands dropping in to try out new material too."
In [152]:
results={}
for i in range(-2, -12, -1):
    similar_index=similarities_sorted[doc_index][i]
    rank=similarities[doc_index][similar_index]
    results[similar_index]=[rank]
In [153]:
results
Out[153]:
{8540: [0.74822885],
 2731: [0.7339019],
 15061: [0.650432],
 6897: [0.61143035],
 4984: [0.5936791],
 5416: [0.5624719],
 5766: [0.545264],
 5992: [0.5420095],
 7890: [0.54115],
 13244: [0.5370784]}
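The lookup above relies on the document itself being the last entry of each sorted row, which does not hold when two descriptions are identical (see rows 15079 and 15080 of index_df). A hedged alternative is to mask the document's own score before ranking. The sketch below uses plain NumPy on a made-up 2-dimensional embedding matrix. The function name `top_k_similar` and the toy vectors are assumptions, not part of the notebook.

```python
import numpy as np

def top_k_similar(embeddings, doc_index, k=10):
    """Return the k most similar rows to embeddings[doc_index] by cosine similarity."""
    # Normalise the rows so that a dot product equals the cosine similarity
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    scores = normed @ normed[doc_index]
    # Exclude the document itself explicitly instead of assuming it is ranked last
    scores[doc_index] = -np.inf
    best = np.argsort(scores)[::-1][:k]
    return [(int(i), float(scores[i])) for i in best]

# Toy embeddings: row 3 is an exact duplicate of row 0
toy = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.0, 0.0]])
neighbours = top_k_similar(toy, doc_index=0, k=2)
print(neighbours)
```

With the real data the call would be `top_k_similar(text_embeddings, doc_index)`. This also avoids building the full 15084 × 15084 similarity matrix when only a few documents are queried.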

Experiment 4: Description Topic Modelling - Deep Learning - BERTopic

Let's model the topics of our descriptions. We are going to reuse the text_embeddings calculated in the previous experiment.

In [154]:
len(documents)
Out[154]:
15084
In [156]:
topic_model = BERTopic(min_topic_size=20).fit(d, text_embeddings)
In [157]:
topics, probs = topic_model.transform(d, text_embeddings)

Visualizing our topics

In [158]:
topic_model.visualize_topics()
In [159]:
#### Visualizing the first 5 keywords of our first 5 topics
In [160]:
topic_model.visualize_barchart()

Visualizing the similarity between topics

In [161]:
topic_model.visualize_heatmap()

Getting the frequency of each topic.

Topic -1 groups the outlier documents that BERTopic could not assign to any topic, so we should always ignore it.

In [162]:
#Let's see the frequency of the first 10 rows (outlier topic -1 included)
topic_model.get_topic_freq()[0:10]
Out[162]:
Topic Count
0 -1 4719
1 0 3140
2 1 2469
3 2 652
4 3 266
5 4 258
6 5 244
7 6 175
8 7 161
9 8 155
In [163]:
print("Number of topics found %s" %len(topic_model.get_topic_freq()))
Number of topics found 66
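Since topic -1 is BERTopic's outlier bucket, it can be filtered out before any further analysis. The sketch below shows one way to do that with pandas. The small frame is hand-copied from the first rows of the frequency table above and stands in for the output of `topic_model.get_topic_freq()`, so the snippet runs without the fitted model.

```python
import pandas as pd

# Hand-copied from the first rows of the frequency table; in the notebook this
# frame would come from topic_model.get_topic_freq()
topic_freq = pd.DataFrame({"Topic": [-1, 0, 1, 2],
                           "Count": [4719, 3140, 2469, 652]})

# Keep only the real topics
real_topics = topic_freq[topic_freq["Topic"] != -1]

# Share of descriptions that ended up in the outlier bucket
outlier_count = int(topic_freq.loc[topic_freq["Topic"] == -1, "Count"].sum())
total_documents = 15084  # len(documents) in the notebook
outlier_share = outlier_count / total_documents
print(f"{outlier_share:.1%} of the descriptions are outliers")
```

The same filter also shows that the 66 rows reported by `len(topic_model.get_topic_freq())` include the outlier row.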

Visualizing the keywords of our topics.

In [164]:
#topic_model.get_topics()
In [174]:
document_0_topic=topics[0]
print("The topic of the document 0 is %s " %document_0_topic)
The topic of the document 0 is 0 
In [175]:
topic_model.get_topic(0)
Out[175]:
[('music', 0.015782324152882023),
 ('band', 0.013903156375026326),
 ('jazz', 0.011746250262168027),
 ('songs', 0.010532104667705754),
 ('album', 0.00948752679270438),
 ('blues', 0.008979971865858445),
 ('folk', 0.008784042842143132),
 ('guitar', 0.008119087259602874),
 ('concert', 0.007805208362342594),
 ('rock', 0.0071212721556192515)]
In [178]:
df_desc1["description"].iloc[0]
Out[178]:
'Brilliant mix of English tradition and American music from the virtuoso fingerstyle guitarist and singer, a former BBC Radio 2 Folk Musician of the Year.'

Experiment 5: Exploring the Schedules of Events

  • 1 Event can have 1 to N Schedules.
  • 1 Schedule is in 1 Place
  • 1 Schedule can have 1 to N Performances
  • 1 Performance can have 1 to N Tickets
  • 1 Ticket has a max_price, min_price, currency.

Let's start by exploding the schedules column

In [50]:
df["schedules"]
Out[50]:
0        [{'start_ts': '2018-05-07T20:00:00+01:00', 'en...
1        [{'start_ts': '2018-05-07T20:30:00+01:00', 'en...
2        [{'start_ts': '2018-05-05T21:00:00+01:00', 'en...
3        [{'start_ts': '2018-06-03T20:30:00+01:00', 'en...
4        [{'start_ts': '2018-08-16T20:00:00+01:00', 'en...
                               ...                        
10092    [{'start_ts': '2018-10-12T20:00:00+01:00', 'en...
10093    [{'start_ts': '2018-10-13T15:30:00+01:00', 'en...
10094    [{'start_ts': '2018-10-27T22:30:00+01:00', 'en...
10095    [{'start_ts': '2018-08-02T21:00:00+01:00', 'en...
10096    [{'start_ts': '2018-08-02T11:00:00+01:00', 'en...
Name: schedules, Length: 10097, dtype: object
In [51]:
# df_schedules is an alias of df (not a copy), so the in-place rename also renames the columns of df
df_schedules=df
df_schedules.rename(columns={'tags':'event_tags', 'name':'event_name', 'links':'event_links'}, inplace=True)
# one row per schedule
df_schedules=df.explode('schedules')
#df_schedules
# expanding each schedule dictionary into its own columns
df_s=pd.concat([df_schedules.drop(['schedules'], axis=1), df_schedules['schedules'].apply(pd.Series)], axis=1)
In [52]:
df_s.iloc[0]
Out[52]:
event_id                                                        194419
modified_ts                                       2021-11-13T01:03:47Z
created_ts                                        2010-01-25T14:51:46Z
event_name                                              Martin Simpson
sort_name                                               Martin Simpson
status                                                            live
id                                                              194419
descriptions         [{'type': 'description.list.default', 'descrip...
website                                  http://www.martinsimpson.com/
event_tags                    [Blues, Folk, Folk & world, Jazz, Music]
category                                                         Music
properties           {'list:website:comments-end-date': '2020-01-28...
ranking_level                                                        2
ranking_in_level                                                     1
alternative_names                                                  NaN
phone_numbers                                                      NaN
start_ts                                     2018-05-07T20:00:00+01:00
end_ts                                       2018-05-07T20:00:00+01:00
place_id                                                           386
performances         [{'ts': '2018-05-07T20:00:00+01:00', 'links': ...
performance_space                                                  NaN
phone_numbers                                                      NaN
Name: 0, dtype: object

Getting the Frequency of Starting Dates of Events Schedules

In [53]:
df_start=df_s.groupby([pd.to_datetime(df_s['start_ts'])]).size().reset_index()
df_start=df_start.rename(columns={0: "number_of_times"})
df_start=df_start.sort_values(by=['number_of_times'], ascending=False)
df_start.reset_index()
Out[53]:
index start_ts number_of_times
0 4 2018-05-01 10:00:00+01:00 32
1 1701 2018-08-02 12:00:00+01:00 25
2 4393 2018-10-27 10:00:00+01:00 22
3 1757 2018-08-02 17:00:00+01:00 21
4 1768 2018-08-02 18:00:00+01:00 21
... ... ... ...
4482 1974 2018-08-03 21:55:00+01:00 1
4483 1981 2018-08-03 22:35:00+01:00 1
4484 1984 2018-08-03 22:50:00+01:00 1
4485 1986 2018-08-03 23:05:00+01:00 1
4486 4486 2018-10-31 23:00:00+00:00 1

4487 rows × 3 columns

This means that the most frequent start time is 2018-05-01 10:00:00+01:00, with 32 event schedules starting then.
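The counting step can be reproduced on a few made-up timestamps. The sketch below is an illustration under one assumption that the notebook does not make: it passes `utc=True` to `pd.to_datetime`, so that timestamps with different UTC offsets (+01:00 and +00:00 both occur in the data) end up in a single timezone-aware column.

```python
import pandas as pd

# Made-up schedule start times with mixed UTC offsets
schedules = pd.DataFrame({"start_ts": [
    "2018-05-01T10:00:00+01:00",
    "2018-05-01T10:00:00+01:00",
    "2018-08-02T12:00:00+01:00",
    "2018-10-31T23:00:00+00:00",
]})

# Parse to timezone-aware timestamps, normalised to UTC (assumption, see above)
start = pd.to_datetime(schedules["start_ts"], utc=True)

# Number of schedules per start time, most frequent first
start_counts = (
    schedules.groupby(start).size()
    .reset_index(name="number_of_times")
    .sort_values("number_of_times", ascending=False)
    .reset_index(drop=True)
)
print(start_counts)
```

Grouping by `start.dt.date` instead of the full timestamp would give the frequency per day rather than per exact start time.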

In [54]:
#### Visualizing the previous Start_Ts Schedules Events Freq.
In [55]:
fig = px.histogram(df_start, x='start_ts', y="number_of_times", title="Frequency of Starts Dates Schedules")
fig.show()

Getting the Frequency of End Dates of Events Schedules

In [56]:
df_end=df_s.groupby([pd.to_datetime(df_s['end_ts'])]).size().reset_index()
df_end=df_end.rename(columns={0: "number_of_times"})
df_end=df_end.sort_values(by=['number_of_times'], ascending=False)
df_end.reset_index()
fig = px.histogram(df_end, x='end_ts', y="number_of_times", title="Frequency of End Dates Schedules")
fig.show()

Experiment 6: Exploring the Performances Tickets of Events Schedules

  • 1 Event can have 1 to N Schedules.
  • 1 Schedule is in 1 Place
  • 1 Schedule can have 1 to N Performances
  • 1 Performance can have 1 to N Tickets
  • 1 Ticket has a max_price, min_price, currency.

Let's start by exploding the performances column. We cannot explode the performances column unless the schedules column has been exploded first. For that reason, we use the df_s dataframe, in which the schedules column has already been exploded.
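The Event → Schedule → Performance → Ticket nesting can be flattened one level at a time. The sketch below does this on a single made-up event. The helper name `explode_dicts` and the toy values are assumptions, and the helper expects every cell to hold a non-empty list of dictionaries, which the real data does not guarantee (some rows have no tickets).

```python
import pandas as pd

# One made-up event with one schedule, two performances and three tickets
events = [
    {"event_id": 1, "schedules": [
        {"place_id": 10, "performances": [
            {"ts": "2018-05-07T20:00:00+01:00",
             "tickets": [{"type": "Standard", "min_price": 11.0}]},
            {"ts": "2018-05-14T20:00:00+01:00",
             "tickets": [{"type": "Standard", "min_price": 11.0},
                         {"type": "Children", "min_price": 5.0}]},
        ]},
    ]},
]

def explode_dicts(frame, column):
    """Explode a column of lists of dicts, then expand each dict into columns."""
    exploded = frame.explode(column).reset_index(drop=True)
    expanded = pd.DataFrame(exploded[column].tolist())
    return pd.concat([exploded.drop(columns=[column]), expanded], axis=1)

# Each level can only be exploded after its parent level has been exploded
flat = pd.DataFrame(events)
for level in ["schedules", "performances", "tickets"]:
    flat = explode_dicts(flat, level)
print(flat)
```

The result has one row per ticket, with the event, place and performance information repeated on each row.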

In [57]:
df_s
Out[57]:
event_id modified_ts created_ts event_name sort_name status id descriptions website event_tags ... ranking_level ranking_in_level alternative_names phone_numbers start_ts end_ts place_id performances performance_space phone_numbers
0 194419 2021-11-13T01:03:47Z 2010-01-25T14:51:46Z Martin Simpson Martin Simpson live 194419 [{'type': 'description.list.default', 'descrip... http://www.martinsimpson.com/ [Blues, Folk, Folk & world, Jazz, Music] ... 2 1 NaN NaN 2018-05-07T20:00:00+01:00 2018-05-07T20:00:00+01:00 386 [{'ts': '2018-05-07T20:00:00+01:00', 'links': ... NaN NaN
1 345866 2019-06-09T12:51:49Z 2013-03-01T10:33:08Z Red Raw Red Raw live 345866 [{'type': 'description.list.default', 'descrip... NaN [Comedy, Red Raw, Stand-up] ... 3 2 NaN NaN 2018-05-07T20:30:00+01:00 2018-10-29T20:30:00+00:00 1 [{'ts': '2018-05-07T20:30:00+01:00', 'links': ... NaN NaN
2 347164 2020-03-23T07:05:08Z 2013-03-18T13:05:44Z The Saturday Show Saturday Show live 347164 [{'type': 'description.list.default', 'descrip... NaN [Comedy, Stand-up, The Saturday Show] ... 3 2 NaN NaN 2018-05-05T21:00:00+01:00 2018-10-27T20:30:00+01:00 1 [{'ts': '2018-05-05T21:00:00+01:00', 'links': ... NaN NaN
3 347313 2020-03-24T07:05:11Z 2013-03-21T12:44:28Z The Sunday Night Laugh-In Sunday Night Laugh-In live 347313 [{'type': 'description.list.default', 'descrip... NaN [Comedy, Stand-up, Sunday Night Laugh-In] ... 3 2 NaN NaN 2018-06-03T20:30:00+01:00 2018-10-14T20:30:00+01:00 1 [{'ts': '2018-06-03T20:30:00+01:00', 'links': ... NaN NaN
4 387089 2021-09-25T01:53:13Z 2014-01-21T12:29:58Z Katherine Ryan: Glitter Room Katherine Ryan: Glitter Room live 387089 [{'type': 'description.list.default', 'descrip... http://www.katherineryan.co.uk [Comedy, Stand-Up, Stand-up] ... 2 1 NaN NaN 2018-08-16T20:00:00+01:00 2018-08-16T20:00:00+01:00 385 [{'ts': '2018-08-16T20:00:00+01:00', 'links': ... NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
10092 1115077 2018-10-09T02:43:01Z 2018-10-09T02:43:01Z The Whitney Houston Experience And The Motown ... Whitney Houston Experience And The Motown Brot... live 1115077 [{'type': 'description.official', 'description... NaN [Music] ... 3 2 NaN NaN 2018-10-12T20:00:00+01:00 2018-10-12T20:00:00+01:00 112007 [{'ts': '2018-10-12T20:00:00+01:00', 'links': ... NaN NaN
10093 1118842 2018-10-12T11:08:06Z 2018-10-12T09:34:50Z Aberlady Goose Walk Aberlady Goose Walk live 1118842 [{'type': 'description.list.default', 'descrip... NaN [Days out, Nature, Walks, Wildlife] ... 3 2 NaN NaN 2018-10-13T15:30:00+01:00 2018-10-13T15:30:00+01:00 112611 [{'ts': '2018-10-13T15:30:00+01:00', 'duration... NaN NaN
10094 1121581 2018-10-16T02:16:15Z 2018-10-16T02:16:15Z Tokyo NIghtclub Presents HalloScream Tokyo NIghtclub Presents HalloScream live 1121581 [{'type': 'description.official', 'description... NaN [Clubs, Cheesy Dance, EDM, Pop] ... 3 2 NaN NaN 2018-10-27T22:30:00+01:00 2018-10-27T22:30:00+01:00 112723 [{'ts': '2018-10-27T22:30:00+01:00', 'duration... NaN NaN
10095 947564 2019-01-23T10:35:10Z 2019-01-23T10:35:10Z Pamela's Palace Pamela's Palace live 947564 [{'type': 'description.list.default', 'descrip... http://www.interactivetheatre.com.au [Comedy, Theatre] ... 3 2 NaN NaN 2018-08-02T21:00:00+01:00 2018-08-27T21:00:00+01:00 114042 [{'ts': '2018-08-02T21:00:00+01:00', 'duration... Hanover Suite NaN
10096 961311 2018-11-29T17:04:14Z 2018-03-22T20:00:16Z Signor Baffo's Restaurant Signor Baffo's Restaurant live 961311 [{'type': 'description.list.default', 'descrip... NaN [Kids] ... 3 2 NaN NaN 2018-08-02T11:00:00+01:00 2018-08-27T11:00:00+01:00 114042 [{'ts': '2018-08-02T11:00:00+01:00', 'duration... NaN NaN

11096 rows × 22 columns

In [58]:
a=df_s[["event_id", "event_name", "performances", "event_tags", "start_ts", "end_ts", "place_id"]]
df_p=a.explode("performances")
In [59]:
df_p
Out[59]:
event_id event_name performances event_tags start_ts end_ts place_id
0 194419 Martin Simpson {'ts': '2018-05-07T20:00:00+01:00', 'links': [... [Blues, Folk, Folk & world, Jazz, Music] 2018-05-07T20:00:00+01:00 2018-05-07T20:00:00+01:00 386
1 345866 Red Raw {'ts': '2018-05-07T20:30:00+01:00', 'links': [... [Comedy, Red Raw, Stand-up] 2018-05-07T20:30:00+01:00 2018-10-29T20:30:00+00:00 1
1 345866 Red Raw {'ts': '2018-05-14T20:30:00+01:00', 'links': [... [Comedy, Red Raw, Stand-up] 2018-05-07T20:30:00+01:00 2018-10-29T20:30:00+00:00 1
1 345866 Red Raw {'ts': '2018-05-21T20:30:00+01:00', 'links': [... [Comedy, Red Raw, Stand-up] 2018-05-07T20:30:00+01:00 2018-10-29T20:30:00+00:00 1
1 345866 Red Raw {'ts': '2018-05-28T20:30:00+01:00', 'links': [... [Comedy, Red Raw, Stand-up] 2018-05-07T20:30:00+01:00 2018-10-29T20:30:00+00:00 1
... ... ... ... ... ... ... ...
10096 961311 Signor Baffo's Restaurant {'ts': '2018-08-24T11:00:00+01:00', 'duration'... [Kids] 2018-08-02T11:00:00+01:00 2018-08-27T11:00:00+01:00 114042
10096 961311 Signor Baffo's Restaurant {'ts': '2018-08-25T11:00:00+01:00', 'duration'... [Kids] 2018-08-02T11:00:00+01:00 2018-08-27T11:00:00+01:00 114042
10096 961311 Signor Baffo's Restaurant {'ts': '2018-08-26T11:00:00+01:00', 'duration'... [Kids] 2018-08-02T11:00:00+01:00 2018-08-27T11:00:00+01:00 114042
10096 961311 Signor Baffo's Restaurant {'ts': '2018-08-27T11:00:00+01:00', 'duration'... [Kids] 2018-08-02T11:00:00+01:00 2018-08-27T11:00:00+01:00 114042
10096 961311 Signor Baffo's Restaurant {'ts': '2018-08-07T11:00:00+01:00', 'duration'... [Kids] 2018-08-02T11:00:00+01:00 2018-08-27T11:00:00+01:00 114042

97992 rows × 7 columns

In [60]:
df_p=pd.concat([df_p.drop(['performances'], axis=1), df_p['performances'].apply(pd.Series)], axis=1)
In [61]:
df_p[0:2]
Out[61]:
event_id event_name event_tags start_ts end_ts place_id ts links tickets descriptions duration properties time_unknown
0 194419 Martin Simpson [Blues, Folk, Folk & world, Jazz, Music] 2018-05-07T20:00:00+01:00 2018-05-07T20:00:00+01:00 386 2018-05-07T20:00:00+01:00 [{'type': 'booking', 'url': 'https://www.trave... [{'type': 'Standard', 'currency': 'GBP', 'min_... NaN NaN NaN NaN
1 345866 Red Raw [Comedy, Red Raw, Stand-up] 2018-05-07T20:30:00+01:00 2018-10-29T20:30:00+00:00 1 2018-05-07T20:30:00+01:00 [{'type': 'booking', 'url': 'http://www.thesta... [{'type': 'Standard', 'currency': 'GBP', 'min_... [{'type': 'list.description.default', 'descrip... NaN NaN NaN

Exploring tickets

Now we have to explode the tickets column. First we remove the rows whose ticket information is empty.

In [62]:
df_p=df_p.dropna(subset=['tickets'])

Since we don't need all the columns, we select a few of them.

In [63]:
df_t=df_p[["event_id", "event_name", "descriptions", "event_tags", "tickets", "place_id", "start_ts", "end_ts"]]
In [64]:
df_t[0:5]
Out[64]:
event_id event_name descriptions event_tags tickets place_id start_ts end_ts
0 194419 Martin Simpson NaN [Blues, Folk, Folk & world, Jazz, Music] [{'type': 'Standard', 'currency': 'GBP', 'min_... 386 2018-05-07T20:00:00+01:00 2018-05-07T20:00:00+01:00
1 345866 Red Raw [{'type': 'list.description.default', 'descrip... [Comedy, Red Raw, Stand-up] [{'type': 'Standard', 'currency': 'GBP', 'min_... 1 2018-05-07T20:30:00+01:00 2018-10-29T20:30:00+00:00
1 345866 Red Raw NaN [Comedy, Red Raw, Stand-up] [{'type': 'Standard', 'currency': 'GBP', 'min_... 1 2018-05-07T20:30:00+01:00 2018-10-29T20:30:00+00:00
1 345866 Red Raw [{'type': 'list.description.default', 'descrip... [Comedy, Red Raw, Stand-up] [{'type': 'Standard', 'currency': 'GBP', 'min_... 1 2018-05-07T20:30:00+01:00 2018-10-29T20:30:00+00:00
1 345866 Red Raw [{'type': 'list.description.default', 'descrip... [Comedy, Red Raw, Stand-up] [{'type': 'Standard', 'currency': 'GBP', 'min_... 1 2018-05-07T20:30:00+01:00 2018-10-29T20:30:00+00:00
In [65]:
df_t1=df_t.explode("tickets")

Now we are going to convert the max and min ticket prices to numeric values, filling missing prices with 0.

In [66]:
df_tickets=pd.concat([df_t1.drop(['tickets'], axis=1), df_t1['tickets'].apply(pd.Series)], axis=1)
df_tickets['min_price'] = pd.to_numeric(df_tickets['min_price'])
df_tickets['max_price'] = pd.to_numeric(df_tickets['max_price'])
df_tickets['min_price']= df_tickets['min_price'].fillna(0)
df_tickets['max_price']= df_tickets['max_price'].fillna(0)
In [67]:
df_tickets[0:5]
Out[67]:
event_id event_name descriptions event_tags place_id start_ts end_ts 0 currency description max_price min_price type
0 194419 Martin Simpson NaN [Blues, Folk, Folk & world, Jazz, Music] 386 2018-05-07T20:00:00+01:00 2018-05-07T20:00:00+01:00 NaN GBP NaN 0.0 11.0 Standard
1 345866 Red Raw [{'type': 'list.description.default', 'descrip... [Comedy, Red Raw, Stand-up] 1 2018-05-07T20:30:00+01:00 2018-10-29T20:30:00+00:00 NaN GBP NaN 0.0 3.0 Standard
1 345866 Red Raw NaN [Comedy, Red Raw, Stand-up] 1 2018-05-07T20:30:00+01:00 2018-10-29T20:30:00+00:00 NaN GBP NaN 0.0 3.0 Standard
1 345866 Red Raw [{'type': 'list.description.default', 'descrip... [Comedy, Red Raw, Stand-up] 1 2018-05-07T20:30:00+01:00 2018-10-29T20:30:00+00:00 NaN GBP NaN 0.0 3.0 Standard
1 345866 Red Raw [{'type': 'list.description.default', 'descrip... [Comedy, Red Raw, Stand-up] 1 2018-05-07T20:30:00+01:00 2018-10-29T20:30:00+00:00 NaN GBP NaN 0.0 3.0 Standard
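The explode-and-expand step above can also be sketched with pd.json_normalize, which builds the ticket columns in one call instead of applying pd.Series row by row. This is a minimal sketch on made-up toy rows, not the notebook's data; it assumes every exploded row holds a dict (rows with an empty ticket list would need to be dropped first).

```python
import pandas as pd

# Toy rows shaped like the notebook's data: a list of ticket dicts per performance.
df = pd.DataFrame({
    "event_id": [194419, 345866],
    "tickets": [
        [{"type": "Standard", "currency": "GBP", "min_price": "11.00"}],
        [{"type": "Standard", "currency": "GBP", "min_price": "3.00"},
         {"type": "Concession", "currency": "GBP", "min_price": "2.00", "max_price": "2.50"}],
    ],
})

# One row per ticket; reset the index so it is unique again after explode.
exploded = df.explode("tickets").reset_index(drop=True)

# Expand every ticket dict into columns in a single call.
ticket_cols = pd.json_normalize(exploded["tickets"].tolist())

flat = pd.concat([exploded.drop(columns="tickets"), ticket_cols], axis=1)

# Convert prices to numbers; missing prices stay NaN instead of becoming 0.
for col in ("min_price", "max_price"):
    flat[col] = pd.to_numeric(flat[col], errors="coerce")

print(flat)
```

Resetting the index before the concat keeps the two frames aligned row by row, since explode repeats the original index for every ticket.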

Experiment 6.1: Getting the Frequency of Price Tickets

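We work only with max_price. Because missing prices were filled with 0 in the previous step, the max_price == 0 group contains both genuinely free tickets and tickets with no recorded maximum price.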

In [68]:
g_maxp=df_tickets.groupby(['max_price']).size().reset_index()
g_maxp=g_maxp.rename(columns={0: "number_of_times"})
#g_maxp=g_maxp.sort_values(by=['number_of_times'], ascending=False)
free_tickets=g_maxp[0:1]
## Removing FREE TICKETS
g_maxp=g_maxp.drop([0])
### 
g_maxp[:]
Out[68]:
max_price number_of_times
1 1.50 2
2 2.00 2
3 3.00 5
4 4.00 1
5 4.50 5
... ... ...
196 240.00 1
197 340.00 25
198 420.00 30
199 550.00 1
200 1278.49 1

200 rows × 2 columns

In [69]:
fig = px.line(g_maxp, x="max_price", y="number_of_times", title='Frequency of price tickets')
fig.show()
In [70]:
print("The number of free tickets is: %s" %free_tickets["number_of_times"].values[0])
The number of free tickets is: 145936
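The count above treats every row with max_price equal to 0 as free. A minimal sketch of how the two cases could be told apart if the fillna(0) step were skipped is shown below; the toy frame and the status labels are assumptions made for illustration.

```python
import pandas as pd

# Toy prices: None marks a price that was not recorded.
prices = pd.DataFrame({
    "min_price": [11.0, 3.0, 0.0, None, 10.0],
    "max_price": [None, None, 0.0, None, 12.0],
})

has_max = prices["max_price"].notna()
is_zero = prices["max_price"].eq(0)

# Start from "priced" and overwrite the two special cases.
status = pd.Series("priced", index=prices.index)
status[~has_max] = "max_price missing"
status[has_max & is_zero] = "free"

print(status.value_counts())
```

In the first rows of the ticket table above, max_price is 0.0 while min_price is 11.0 or 3.0, which is the pattern this check would label as "max_price missing" rather than "free".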

Experiment 6.2: Getting the frequency of type (Standard, Children) tickets

In [71]:
tickets_type=df_tickets.groupby(['type']).size().reset_index()
tickets_type=tickets_type.rename(columns={0: "number_of_times"}).sort_values(by=['number_of_times'], ascending=False)
tickets_type
Out[71]:
type number_of_times
32 Standard 93836
14 Concession 38278
9 Children 7788
17 Family 3230
34 Students 1010
... ... ...
69 before midnight 1
70 children 1
35 Students & seniors 1
72 concert only 1
0 16--26 year olds 1

100 rows × 2 columns

In [72]:
px.histogram(tickets_type, x="type", y="number_of_times", histfunc="sum", color="type", title='Frequency of type tickets')
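The table above lists both "Children" and "children" as separate ticket types. A hedged sketch of normalising the labels before counting follows; the sample values are illustrative, and whether such labels should really be merged is a judgement call for the analyst.

```python
import pandas as pd

# Illustrative labels with inconsistent case and stray whitespace.
types = pd.Series(
    ["Standard", "Children", "children", " Concession", "Students", "Standard"]
)

# Trim whitespace and fold case so spelling variants fall into one group.
normalised = types.str.strip().str.casefold()

counts = normalised.value_counts()
print(counts)
```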

6.3 Exploring Performances Places

In [73]:
df_tickets["place_id"]
Out[73]:
0           386
1             1
1             1
1             1
1             1
          ...  
10096    114042
10096    114042
10096    114042
10096    114042
10096    114042
Name: place_id, Length: 149007, dtype: int64

Creating places dataframe

In [74]:
with open('dataset/sample_20180501.json', 'r') as f:
    data = json.load(f)
places = data["places"]
print(len(places))
df_places = pd.DataFrame(places)
1224
In [75]:
df_place = df_tickets.merge(df_places, on='place_id')
In [76]:
df_place.shape[0]
Out[76]:
149007

This means that the merged dataframe has 149007 rows, the same number as df_tickets, which suggests that every ticket row was matched to a place.
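Comparing row counts is one way to check a merge. A more explicit check can be sketched with the validate and indicator parameters of DataFrame.merge; the two toy frames below are assumptions, with place 999 deliberately missing from the places table.

```python
import pandas as pd

tickets = pd.DataFrame({"place_id": [386, 1, 1, 999], "type": ["Standard"] * 4})
places = pd.DataFrame({"place_id": [386, 1], "town": ["Edinburgh", "Edinburgh"]})

merged = tickets.merge(
    places,
    on="place_id",
    how="left",              # keep ticket rows even when no place matches
    validate="many_to_one",  # raise if place_id is duplicated in places
    indicator=True,          # add a _merge column describing each row's origin
)

# Ticket rows that found no matching place.
unmatched = merged[merged["_merge"] == "left_only"]
print(len(tickets), len(merged), len(unmatched))
```

With a left merge the row count cannot drop silently, and the _merge column shows which rows had no matching place.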

6.3.1 Frequency of Performances per Town

In [77]:
df_town=df_place.dropna(subset=['town'])
town=df_town.groupby(['town']).size().reset_index()
town=town.rename(columns={0: "number_of_times"})
town=town.drop([0])
In [78]:
town=town.sort_values(by=['number_of_times'], ascending=False)
town
Out[78]:
town number_of_times
45 Edinburgh 133877
121 St Andrews 2628
71 Kirkcaldy 960
66 Kelso 946
133 Wilkieston 944
... ... ...
110 Port Seton 1
91 Methil 1
26 Craigrothie 1
101 North Queensferry 1
136 nr Dunbar 1

136 rows × 2 columns

In [79]:
px.scatter(town, x="town",y='number_of_times', color='number_of_times', size="number_of_times", size_max=60, title="Frequency of Performances per Town")

6.3.2 Frequency of Type tickets per town

In [80]:
town_type=df_town.groupby(['town', 'type']).size().reset_index()
town_type=town_type.rename(columns={0: "number_of_times"})
town_type=town_type[town_type["town"]!=""]
In [81]:
town_type=town_type.sort_values(by=['number_of_times'], ascending=False)
town_type
Out[81]:
town type number_of_times
117 Edinburgh Standard 85228
103 Edinburgh Concession 37160
100 Edinburgh Children 4004
104 Edinburgh Family 3161
354 St Andrews Standard 1175
... ... ... ...
230 Lasswade Standard 1
74 Dunbar Concession 1
295 North Queensferry Standard 1
294 North Berwick Under 5s 1
387 nr Dunbar Standard 1

386 rows × 3 columns

In [82]:
fig = px.scatter(town_type, x='town', y='type', color='number_of_times', title="Frequency of type tickets per town")
fig.show()
In [83]:
px.scatter(town_type, x="town",y='type', color='number_of_times', size="number_of_times", size_max=60, title="Frequency of performances type tickets per town")

6.3.3 Frequency of Max_Price tickets per town

In [84]:
a=df_town[["town", "max_price"]]
a=a[a["town"]!=""]
town_price=a.groupby(['town', 'max_price']).size().reset_index()
town_price=town_price.rename(columns={0: "number_of_times"})
town_price=town_price.sort_values(by=['number_of_times'], ascending=False)
town_price
Out[84]:
town max_price number_of_times
66 Edinburgh 0.00 131214
399 St Andrews 0.00 2604
302 Kirkcaldy 0.00 956
425 Wilkieston 0.00 944
293 Kelso 0.00 941
... ... ... ...
215 Edinburgh 60.50 1
217 Edinburgh 62.89 1
219 Edinburgh 65.00 1
220 Edinburgh 70.00 1
428 nr Dunbar 0.00 1

429 rows × 3 columns

6.3.3.1. Frequency of free tickets per town

In [85]:
free_town_price=town_price[town_price["max_price"]== 0.0]
free_town_price
Out[85]:
town max_price number_of_times
66 Edinburgh 0.0 131214
399 St Andrews 0.0 2604
302 Kirkcaldy 0.0 956
425 Wilkieston 0.0 944
293 Kelso 0.0 941
... ... ... ...
27 Craigrothie 0.0 1
35 Currie 0.0 1
291 Juniper Geren 0.0 1
292 Juniper Green 0.0 1
428 nr Dunbar 0.0 1

133 rows × 3 columns
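The output above contains both "Juniper Geren" and "Juniper Green", which looks like a spelling variant of the same town. One possible way to flag such pairs, sketched here with the standard library's difflib, is shown below. The near_duplicates helper and the 0.85 cutoff are assumptions chosen for illustration, and flagged pairs still need manual review.

```python
from difflib import get_close_matches

towns = ["Edinburgh", "Juniper Geren", "Juniper Green", "Dunbar", "nr Dunbar", "Kelso"]

def near_duplicates(names, cutoff=0.85):
    """Return sorted pairs of distinct names whose similarity ratio is >= cutoff."""
    pairs = set()
    for name in names:
        others = [n for n in names if n != name]
        for match in get_close_matches(name, others, n=3, cutoff=cutoff):
            # Store each pair once, regardless of which name was the query.
            pairs.add(tuple(sorted((name, match))))
    return sorted(pairs)

print(near_duplicates(towns))
```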

In [86]:
fig = px.bar(free_town_price, x='town', y='number_of_times', color='number_of_times', barmode='group', title="Frequency of Free Tickets per Town")
fig.show()

6.3.3.2 Frequency of non-free tickets per town

In [87]:
town_price=town_price[town_price["max_price"]!= 0.0]
town_price
Out[87]:
town max_price number_of_times
92 Edinburgh 10.00 291
102 Edinburgh 12.00 161
161 Edinburgh 30.00 138
77 Edinburgh 7.00 138
87 Edinburgh 9.00 130
... ... ... ...
213 Edinburgh 59.50 1
215 Edinburgh 60.50 1
217 Edinburgh 62.89 1
219 Edinburgh 65.00 1
220 Edinburgh 70.00 1

296 rows × 3 columns

In [88]:
fig = px.bar(town_price, x='town', y='max_price', color='number_of_times', barmode='group', title="Frequency of Price Tickets per Town")
fig.show()
In [89]:
town_price.groupby(["town"]).sum().sort_values(by=['max_price'], ascending=False)
Out[89]:
max_price number_of_times
town
Edinburgh 9091.09 2663
Linlithgow 331.65 13
St Andrews 217.50 24
Dalkeith 213.82 12
Peebles 208.00 4
Lothianburn 149.27 4
Haddington 103.00 13
South Queensferry 101.60 15
Musselburgh 94.40 5
Dunfermline 90.25 10
Kelso 77.00 5
Galashiels 74.00 13
Penicuik 69.07 2
Dirleton 55.00 3
Duns 55.00 2
Gorebridge 53.08 2
Innerleithen 51.76 4
Pencaitland 50.00 3
Dunbar 49.00 2
Glenrothes 46.45 2
Livingston 45.50 97
Scottish Borders 43.50 3
Kirkcaldy 43.25 4
Wallyford 42.19 109
Selkirk 38.00 5
Hawick 35.00 2
Eyemouth 31.00 12
Newtongrange 30.00 4
St Monans 25.00 2
Lochgelly 23.00 5
Prestonpans 20.00 1
Anstruther 20.00 3
East Linton 20.00 1
Leith 18.00 1
Falkland 18.00 2
North Berwick 16.00 1
Newport on Tay 15.28 1
Livingston village 15.00 10
Cupar 12.00 1
Crail 12.00 1
Melrose 10.00 1
Jedburgh 10.00 1
Lauder 10.00 1
Armadale 5.00 1
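Since town_price holds one row per (town, max_price) pair, the max_price column in the sum above adds up the distinct price points of each town and does not take number_of_times into account. If an average ticket price per town is wanted instead, a count-weighted mean can be sketched as follows; the toy values are made up for illustration.

```python
import pandas as pd

# One row per (town, max_price) pair, as in town_price.
town_price = pd.DataFrame({
    "town": ["Edinburgh", "Edinburgh", "Livingston", "Peebles"],
    "max_price": [10.0, 30.0, 5.0, 52.0],
    "number_of_times": [3, 1, 4, 1],
})

# Weight each price point by how often it occurs, then aggregate per town.
weighted = (
    town_price.assign(total=town_price["max_price"] * town_price["number_of_times"])
    .groupby("town")[["total", "number_of_times"]]
    .sum()
)
weighted["mean_max_price"] = weighted["total"] / weighted["number_of_times"]

print(weighted.sort_values("mean_max_price", ascending=False))
```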

6.4 Selecting Scottish Cities: Edinburgh, Glasgow, Dundee, Perth, Inverness, Aberdeen, St Andrews

6.4.1 Frequency of Price Tickets per Scottish City

In [90]:
scot_towns_price=town_price[town_price['town'].isin(["Edinburgh", "Glasgow", "Perth", "Inverness", "Dundee", "St Andrews", "Aberdeen"])]
In [91]:
scot_towns_price[0:10]
Out[91]:
town max_price number_of_times
92 Edinburgh 10.0 291
102 Edinburgh 12.0 161
161 Edinburgh 30.0 138
77 Edinburgh 7.0 138
87 Edinburgh 9.0 130
81 Edinburgh 8.0 100
75 Edinburgh 6.0 98
117 Edinburgh 15.0 89
97 Edinburgh 11.0 79
94 Edinburgh 10.5 65
In [92]:
fig = px.bar(scot_towns_price, x='town', y='max_price', color='number_of_times', barmode='group', title="Frequency of Price Tickets per Scottish City")
fig.show()
In [93]:
scot_towns_price.groupby(["town"]).sum().sort_values(by=['max_price'], ascending=False)
Out[93]:
max_price number_of_times
town
Edinburgh 9091.09 2663
St Andrews 217.50 24

6.4.2 Frequency of Type Tickets per Scottish City

In [94]:
scot_towns_type=town_type[town_type['town'].isin(["Edinburgh", "Glasgow", "Perth", "Inverness", "Dundee", "St Andrews", "Aberdeen"])]
In [95]:
scot_towns_type[0:10]
Out[95]:
town type number_of_times
117 Edinburgh Standard 85228
103 Edinburgh Concession 37160
100 Edinburgh Children 4004
104 Edinburgh Family 3161
354 St Andrews Standard 1175
351 St Andrews Children 686
355 St Andrews Students 645
119 Edinburgh Students 355
108 Edinburgh Members 146
113 Edinburgh Previews 68
In [96]:
fig = px.bar(scot_towns_type, x='town', y='number_of_times', color='type', barmode='group', title="Frequency of Type Tickets per Scottish City")
fig.show()
In [97]:
scot_towns_type.groupby(["town"]).sum()
Out[97]:
number_of_times
town
Edinburgh 130608
St Andrews 2577
In [98]:
df_place.loc[0]
Out[98]:
event_id                                                     194419
event_name                                           Martin Simpson
descriptions_x                                                  NaN
event_tags                 [Blues, Folk, Folk & world, Jazz, Music]
place_id                                                        386
start_ts                                  2018-05-07T20:00:00+01:00
end_ts                                    2018-05-07T20:00:00+01:00
0                                                               NaN
currency                                                        GBP
description                                                     NaN
max_price                                                       0.0
min_price                                                      11.0
type                                                       Standard
address                                         10 Cambridge Street
email                                                           NaN
postal_code                                                 EH1 2ED
properties        {'place.facilities.free-wifi': True, 'place.fa...
sort_name                                          Traverse Theatre
town                                                      Edinburgh
website                                   http://www.traverse.co.uk
modified_ts                                    2020-02-20T15:11:56Z
created_ts                                     2020-02-20T15:11:56Z
name                                               Traverse Theatre
loc               {'latitude': '55.94770000', 'longitude': '-3.2...
country_code                                                     GB
tags                                 [Cafes, New Writing, Theatres]
descriptions_y    [{'type': 'description.list.default', 'descrip...
phone_numbers     {'info': '0131 228 1404', 'box_office': '0131 ...
status                                                         live
Name: 0, dtype: object

6.4.3 Frequency of Schedules Dates per Event and per Scottish City

In [99]:
df_place2=df_place.dropna(subset=['town'])
df_place2
df_scott=df_place2[df_place2['town'].isin(["Edinburgh", "Glasgow", "Perth", "Inverness", "Dundee", "St Andrews", "Aberdeen"])]
df_scott=df_scott[["event_id", "event_name", "event_tags", "town", "start_ts", "end_ts"]]
df_scott[0:3]
Out[99]:
event_id event_name event_tags town start_ts end_ts
0 194419 Martin Simpson [Blues, Folk, Folk & world, Jazz, Music] Edinburgh 2018-05-07T20:00:00+01:00 2018-05-07T20:00:00+01:00
1 99847 Old Blind Dogs [Folk, Music] Edinburgh 2018-10-08T20:00:00+01:00 2018-10-08T20:00:00+01:00
2 115149 Tommy Smith Youth Jazz Orchestra [Blues, Jazz, Music] Edinburgh 2018-05-19T14:30:00+01:00 2018-05-19T14:30:00+01:00

Note: An event can have several schedules, and each schedule has a start and an end date. Therefore, an event can have several start and end dates.
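To make the note concrete, the following sketch summarises the start dates of each event: the earliest start, the latest start, and how many rows it has. The rows are toy values. Parsing with utc=True is a choice made here because the timestamps mix +01:00 and +00:00 offsets, and converting to UTC gives a single datetime dtype.

```python
import pandas as pd

# Toy rows: one event with two start dates, another with one.
schedules = pd.DataFrame({
    "event_name": ["Red Raw", "Red Raw", "Martin Simpson"],
    "start_ts": [
        "2018-05-07T20:30:00+01:00",
        "2018-10-29T20:30:00+00:00",
        "2018-05-07T20:00:00+01:00",
    ],
})

# Mixed UTC offsets are normalised to UTC on parsing.
schedules["start_utc"] = pd.to_datetime(schedules["start_ts"], utc=True)

# Earliest start, latest start, and number of rows per event.
span = schedules.groupby("event_name")["start_utc"].agg(["min", "max", "count"])
print(span)
```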

In [100]:
fig = px.scatter(df_scott, x='start_ts', y="event_name", title="Frequency of starting date per event in Scottish cities")
fig.show()
In [101]:
fig = px.scatter(df_scott, x='end_ts', y="event_name", title="Frequency of ending date per event in Scottish cities")
fig.show()

6.4.4 Grouping Schedules per Event and Scottish City

In [102]:
scott_schedule=df_scott.groupby(['event_name', 'town']).size().reset_index()
scott_schedule=scott_schedule.rename(columns={0: "number_of_times"})
scott_schedule=scott_schedule.sort_values(by=['number_of_times'], ascending=False)
scott_schedule
Out[102]:
event_name town number_of_times
4724 Mercat Tours: Historic Underground Edinburgh 2944
4725 Mercat Tours: Secrets of Edinburgh's Royal Mile Edinburgh 2208
4721 Mercat Tours: Ghostly Underground Edinburgh 2208
4720 Mercat Tours: Evening of Ghost and Ghouls Edinburgh 2208
6603 St Andrews Ghost Tours St Andrews 1932
... ... ... ...
4577 Mark Hendry Large Ensemble Edinburgh 1
6558 Spank presents: David Elders All Night Long Edinburgh 1
2359 Edinburgh - Solar Sons - Electric Mother - Sup... Edinburgh 1
2360 Edinburgh - Volcanova Edinburgh 1
3169 Guisers Galore Edinburgh 1

8560 rows × 3 columns

This means that the "Mercat Tours: Historic Underground" event has been scheduled 2944 times in Edinburgh.

In [103]:
t=scott_schedule.groupby(["event_name"]).sum().sort_values(by=['number_of_times'], ascending=False)
t
Out[103]:
number_of_times
event_name
Mercat Tours: Historic Underground 2944
Mercat Tours: Evening of Ghost and Ghouls 2208
Mercat Tours: Secrets of Edinburgh's Royal Mile 2208
Mercat Tours: Ghostly Underground 2208
St Andrews Ghost Tours 1932
... ...
Departures 1
Denzil Meyrick 1
Denial 1
Demystifying Access: How to Create Better Access for Audiences 1
The Bluff, Heave and Hillside of Arthur's Seat: A Poetry Hike 1

8532 rows × 1 columns

This means that the "St Andrews Ghost Tours" event has been scheduled 1932 times across the selected cities.

In [104]:
fig = px.bar(t, title="Frequency of Schedules per event")
fig.show()

6.4.5 Exploring Tags per Schedule and Scottish Cities

In [105]:
a=df_scott.reset_index(drop=True)
tags_town=a[["event_tags", "town"]]
tags_town=tags_town.explode("event_tags")
tags_town
Out[105]:
event_tags town
0 Blues Edinburgh
0 Folk Edinburgh
0 Folk & world Edinburgh
0 Jazz Edinburgh
0 Music Edinburgh
... ... ...
136503 Talks & Lectures St Andrews
136504 Clubs Edinburgh
136504 Cheesy Dance Edinburgh
136504 EDM Edinburgh
136504 Pop Edinburgh

366279 rows × 2 columns

In [106]:
scott_tag=tags_town.groupby(['town', 'event_tags']).size().reset_index()
scott_tag=scott_tag.rename(columns={0: "number_of_times"})
scott_tag=scott_tag.sort_values(by=['number_of_times'], ascending=False)
scott_tag
Out[106]:
town event_tags number_of_times
156 Edinburgh Comedy 52865
731 Edinburgh Theatre 43353
188 Edinburgh Days out 23636
380 Edinburgh Kids 16349
677 Edinburgh Storytelling 15612
... ... ... ...
792 Edinburgh Whisky masterclass 1
793 Edinburgh Whisky tasting 1
446 Edinburgh Mozart 1
444 Edinburgh Mountaineering 1
965 St Andrews wine tasting 1

966 rows × 3 columns

This means that we have 52865 schedules tagged as Comedy in Edinburgh.

In [128]:
fig=px.histogram(scott_tag, x="town", y="number_of_times", histfunc="sum", color="event_tags", title='Frequency of tags in Scottish Cities')
fig.update_layout(legend_traceorder="reversed")
fig.show()
In [108]:
t=scott_tag.groupby(["event_tags"]).sum().sort_values(by=['number_of_times'], ascending=False)
t
Out[108]:
number_of_times
event_tags
Comedy 52894
Theatre 43472
Days out 25670
Kids 16579
Storytelling 15621
... ...
Support 1
Survival Skills 1
Local produce 1
Synth 1
Maya the Bee 1

861 rows × 1 columns

This means that we have 52894 schedules tagged as Comedy across the selected Scottish cities.
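The explode, group and count pattern used in this section can be written compactly, and extended to pick the most frequent tag per town. This is a sketch on toy rows; reset_index(name=...) is used here in place of renaming column 0 afterwards.

```python
import pandas as pd

events = pd.DataFrame({
    "town": ["Edinburgh", "Edinburgh", "St Andrews"],
    "event_tags": [["Comedy", "Stand-up"], ["Comedy", "Theatre"], ["Talks & Lectures"]],
})

tag_counts = (
    events.explode("event_tags")   # one row per (schedule, tag)
    .groupby(["town", "event_tags"])
    .size()
    .reset_index(name="number_of_times")
    .sort_values(["town", "number_of_times"], ascending=[True, False])
)

# After sorting, the first row of each town is its most frequent tag.
top = tag_counts.groupby("town").head(1)
print(top)
```

Note that a schedule with several tags contributes one row per tag, so the tag counts add up to more than the number of schedules.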

6.4.5.1 Exploring the Frequency of schedules tags for Edinburgh

In [109]:
edi_scott_tag=scott_tag[scott_tag['town'].isin(["Edinburgh"])]
edi_scott_tag
Out[109]:
town event_tags number_of_times
156 Edinburgh Comedy 52865
731 Edinburgh Theatre 43353
188 Edinburgh Days out 23636
380 Edinburgh Kids 16349
677 Edinburgh Storytelling 15612
... ... ... ...
454 Edinburgh NYOS 1
792 Edinburgh Whisky masterclass 1
793 Edinburgh Whisky tasting 1
446 Edinburgh Mozart 1
444 Edinburgh Mountaineering 1

837 rows × 3 columns

In [110]:
edi_scott_tag.groupby(["event_tags"]).sum().sort_values(by=['number_of_times'], ascending=False)
Out[110]:
number_of_times
event_tags
Comedy 52865
Theatre 43353
Days out 23636
Kids 16349
Storytelling 15612
... ...
Mary Poppins 1
Support 1
Survival Skills 1
Marillion 1
Masterclass 1

837 rows × 1 columns

In [111]:
fig = px.bar(edi_scott_tag, x='town', y='number_of_times', color='event_tags', barmode='group', title="Frequency of schedules tags for Edinburgh")
fig.show()

6.4.6 Histograms of starting/end schedules dates for Edinburgh

In [112]:
scott_start=df_scott.groupby([pd.to_datetime(df_scott['start_ts']), "town"]).size().reset_index()
scott_start=scott_start.rename(columns={0: "number_of_times"})
scott_start=scott_start.sort_values(by=['number_of_times'], ascending=False)
scott_start.reset_index()
Out[112]:
index start_ts town number_of_times
0 3 2018-05-01 10:00:00+01:00 Edinburgh 4964
1 6 2018-05-01 11:00:00+01:00 Edinburgh 3255
2 10 2018-05-01 13:00:00+01:00 Edinburgh 2402
3 15 2018-05-01 19:00:00+01:00 Edinburgh 2340
4 358 2018-05-24 16:00:00+01:00 St Andrews 1932
... ... ... ... ...
4208 463 2018-05-31 10:30:00+01:00 St Andrews 1
4209 3550 2018-09-18 20:00:00+01:00 Edinburgh 1
4210 989 2018-07-02 11:00:00+01:00 Edinburgh 1
4211 3552 2018-09-19 10:15:00+01:00 Edinburgh 1
4212 4212 2018-10-31 23:00:00+00:00 Edinburgh 1

4213 rows × 4 columns

In [113]:
ed_scott_start=scott_start[scott_start['town'].isin(["Edinburgh"])].reset_index()
ed_scott_start.groupby(["start_ts"]).sum().sort_values(by=['number_of_times'], ascending=False)
#fig = px.bar(ed_scott_start, x='town', y='number_of_times', color='start_ts', barmode='group', title="Frequency of starting date schedules for Edinburgh")
#fig.show()
Out[113]:
index number_of_times
start_ts
2018-05-01 10:00:00+01:00 3 4964
2018-05-01 11:00:00+01:00 6 3255
2018-05-01 13:00:00+01:00 10 2402
2018-05-01 19:00:00+01:00 15 2340
2018-05-01 19:30:00+01:00 16 1513
... ... ...
2018-08-16 14:30:00+01:00 2702 1
2018-06-19 18:00:00+01:00 703 1
2018-06-19 12:45:00+01:00 701 1
2018-06-18 19:30:00+01:00 700 1
2018-10-31 23:00:00+00:00 4212 1

3997 rows × 2 columns

In [114]:
scott_end=df_scott.groupby([pd.to_datetime(df_scott['end_ts']), "town"]).size().reset_index()
scott_end=scott_end.rename(columns={0: "number_of_times"})
scott_end=scott_end.sort_values(by=['number_of_times'], ascending=False)
scott_end.reset_index()
ed_scott_end=scott_end[scott_end['town'].isin(["Edinburgh"])].reset_index()
ed_scott_end.groupby(["end_ts"]).sum().sort_values(by=['number_of_times'], ascending=False)
#fig = px.bar(ed_scott_end, x='town', y='number_of_times', color='end_ts', barmode='group', title="Frequency of ending date schedules for Edinburgh")
#fig.show()
Out[114]:
index number_of_times
end_ts
2018-10-31 10:00:00+00:00 4021 2982
2018-10-31 16:00:00+00:00 4032 2944
2018-10-31 21:00:00+00:00 4044 2260
2018-10-31 17:00:00+00:00 4033 2209
2018-10-31 15:00:00+00:00 4031 2208
... ... ...
2018-05-31 18:30:00+01:00 398 1
2018-05-31 18:00:00+01:00 397 1
2018-05-31 13:00:00+01:00 392 1
2018-05-30 18:30:00+01:00 383 1
2018-05-01 12:00:00+01:00 0 1

3833 rows × 2 columns

In [115]:
fig = px.histogram(ed_scott_start, x='start_ts', y="number_of_times", title="Histogram of Schedules Starting Dates for Edinburgh")
fig.show()
In [116]:
fig = px.histogram(scott_start, x='start_ts', y="number_of_times", title="Histogram of Schedules Starting Dates for Scottish Cities")
fig.show()
In [117]:
fig = px.histogram(scott_end, x='end_ts', y="number_of_times", title="Histogram of Schedules Ending Dates for Scottish Cities")
fig.show()
In [118]:
fig = px.histogram(scott_end, x="end_ts", y="number_of_times", histfunc="sum", title="Histogram on Date Axes")
fig.update_traces(xbins_size="M1")
fig.update_xaxes(showgrid=True, ticklabelmode="period", dtick="M1", tickformat="%b\n%Y")
fig.update_layout(bargap=0.1)
fig.add_trace(go.Scatter(mode="markers", x=scott_end["end_ts"], y=scott_end["number_of_times"], name="daily"))
fig.show()
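The monthly bins drawn by the histogram above can also be computed as a table in pandas. The sketch below sums number_of_times per calendar month on a few toy rows; grouping on a "%Y-%m" string is one simple choice made for this illustration.

```python
import pandas as pd

ends = pd.DataFrame({
    "end_ts": [
        "2018-05-31T18:30:00+01:00",
        "2018-10-31T10:00:00+00:00",
        "2018-10-31T16:00:00+00:00",
    ],
    "number_of_times": [1, 2982, 2944],
})

# Parse to UTC so the mixed offsets share a single datetime dtype.
ends["end_utc"] = pd.to_datetime(ends["end_ts"], utc=True)

# Sum the counts per calendar month.
monthly = ends.groupby(ends["end_utc"].dt.strftime("%Y-%m"))["number_of_times"].sum()
print(monthly)
```

Converting to UTC can move a timestamp that falls shortly after midnight local time into the previous day, so counts at month boundaries may differ slightly from a local-time binning.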

6.4.7 Working with Schedule tags, Scottish cities, Starting/End Time

In [119]:
b=df_scott.reset_index(drop=True)
tag_town_time=b[["event_tags", "town", "start_ts", "end_ts"]]
tag_town_time=tag_town_time.explode("event_tags")
tag_town_time
Out[119]:
event_tags town start_ts end_ts
0 Blues Edinburgh 2018-05-07T20:00:00+01:00 2018-05-07T20:00:00+01:00
0 Folk Edinburgh 2018-05-07T20:00:00+01:00 2018-05-07T20:00:00+01:00
0 Folk & world Edinburgh 2018-05-07T20:00:00+01:00 2018-05-07T20:00:00+01:00
0 Jazz Edinburgh 2018-05-07T20:00:00+01:00 2018-05-07T20:00:00+01:00
0 Music Edinburgh 2018-05-07T20:00:00+01:00 2018-05-07T20:00:00+01:00
... ... ... ... ...
136503 Talks & Lectures St Andrews 2018-10-23T19:30:00+01:00 2018-10-23T19:30:00+01:00
136504 Clubs Edinburgh 2018-10-27T22:30:00+01:00 2018-10-27T22:30:00+01:00
136504 Cheesy Dance Edinburgh 2018-10-27T22:30:00+01:00 2018-10-27T22:30:00+01:00
136504 EDM Edinburgh 2018-10-27T22:30:00+01:00 2018-10-27T22:30:00+01:00
136504 Pop Edinburgh 2018-10-27T22:30:00+01:00 2018-10-27T22:30:00+01:00

366279 rows × 4 columns

In [120]:
scott_tag_end=tag_town_time.groupby([pd.to_datetime(tag_town_time['end_ts']), "event_tags"]).size().reset_index()
scott_tag_end=scott_tag_end.rename(columns={0: "number_of_times"})
scott_tag_end=scott_tag_end.sort_values(by=['number_of_times'], ascending=False)


scott_tag_start=tag_town_time.groupby([pd.to_datetime(tag_town_time['start_ts']), "event_tags"]).size().reset_index()
scott_tag_start=scott_tag_start.rename(columns={0: "number_of_times"})
scott_tag_start=scott_tag_start.sort_values(by=['number_of_times'], ascending=False)
In [121]:
scott_tag_start
Out[121]:
start_ts event_tags number_of_times
34 2018-05-01 10:00:00+01:00 Days out 3615
71 2018-05-01 11:00:00+01:00 Days out 3141
85 2018-05-01 11:00:00+01:00 Walks 3128
74 2018-05-01 11:00:00+01:00 History 2983
75 2018-05-01 11:00:00+01:00 Kids 2957
... ... ... ...
11413 2018-08-15 19:20:00+01:00 Cabaret 1
11404 2018-08-15 18:30:00+01:00 Visual art 1
11402 2018-08-15 18:30:00+01:00 Talks & Lectures 1
11400 2018-08-15 18:30:00+01:00 Painting & Drawing 1
17659 2018-10-31 23:00:00+00:00 House 1

17660 rows × 3 columns

6.4.7.1 Frequency of schedules Starting Date in Scottish City

In [122]:
#fig = px.bar(scott_tag_start, x='event_tags', y='start_ts', color='number_of_times', barmode='group', title="Frequency of schedules tags per Scottish City")
#fig.show()

fig = px.scatter(scott_tag_start, x='start_ts', y='number_of_times', title="Frequency of schedules Starting Date in Scottish City.")
fig.show()

6.4.7.2 Frequency of schedules Ending Date in Scottish City

In [123]:
fig = px.scatter(scott_tag_end, x='end_ts', y='number_of_times', title="Frequency of schedules Ending Date in Scottish City.")
fig.show()

6.4.7.3 Scheduled tags and Starting Dates in Scottish City

In [124]:
fig = px.scatter(scott_tag_start, x='start_ts', y='event_tags', title="Scheduled Tags and Starting Dates in Scottish City.")
fig.show()

6.4.7.4 Scheduled Tags and Ending Dates in Scottish City

In [125]:
fig = px.scatter(scott_tag_end, x='end_ts', y='event_tags', title="Scheduled Tags and Ending Dates in Scottish City.")
fig.show()